Executing Nested Parallel Loops on Shared-Memory Multiprocessors
Authors: Sadun
Abstract
Cache-coherent, bus-based shared-memory multiprocessors are a cost-effective platform for parallel processing. In scientific parallel applications, most of the computation involves processing large multidimensional data structures, which results in a high degree of data parallelism. This parallelism can be exploited in the form of nested parallel loops. Most existing shared-memory multiprocessors exploit this multi-level parallelism at only one level. In this paper, we explore efficient algorithms and models for executing nested parallel loops and present a simulation-based performance comparison of different techniques using real application traces. We show that it is possible to exploit the parallelism in nested parallel loops with the use of good scheduling and synchronization algorithms.
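As an illustration of what exploiting both levels of a loop nest can mean, the sketch below flattens a two-level nest into a single index space (often called loop coalescing) and hands every iteration to one pool of workers. This is not taken from the paper and is not claimed to be one of the techniques it evaluates; the names `coalesced_indices`, `run_nested_parallel`, and `scale` are hypothetical, and the loop body is assumed to have no dependences between iterations.

```python
from concurrent.futures import ThreadPoolExecutor


def coalesced_indices(n_outer, n_inner):
    """Enumerate every (i, j) pair of a two-level loop nest as one flat list."""
    return [divmod(k, n_inner) for k in range(n_outer * n_inner)]


def run_nested_parallel(body, n_outer, n_inner, workers=4):
    """Run body(i, j) for all iterations, drawing work from both loop levels."""
    pairs = coalesced_indices(n_outer, n_inner)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with pairs.
        flat = list(pool.map(lambda ij: body(*ij), pairs))
    # Rebuild the two-dimensional result layout, one row per outer iteration.
    return [flat[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


def scale(i, j):
    # Hypothetical loop body: any independent per-element computation.
    return 10 * i + j


grid = run_nested_parallel(scale, n_outer=3, n_inner=4)
```

If only the outer loop were parallel, at most `n_outer` workers could be busy; with the flattened index space, up to `n_outer * n_inner` iterations are available for scheduling. Python threads are used here only to show the work distribution, not to demonstrate speedup.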
Similar resources
Executing Nested Parallel Loops on Shared-Memory Multiprocessors
Cache-coherent, bus-based shared-memory multiprocessors are a cost-effective platform for parallel processing. In scientific parallel applications, most of the computation involves processing of large multidimensional data structures which results in a high degree of data parallelism. This parallelism can be exploited in the form of nested parallel loops. Most existing shared memory multiprocesso...
A hybrid scheme for efficiently executing nested loops on multiprocessors
Wang, C.-M. and S.-D. Wang, A hybrid scheme for efficiently executing nested loops on multiprocessors, Parallel Computing 18 (1992) 625-637. In this paper, we address the problem of scheduling parallel processors for efficiently executing nested loops. The goal is to achieve optimal load-balancing by using as few scheduling and communication operations as possible. For this purpose, we propo...
A Scheme for Detecting the Termination of a Parallel Loop Nest
One central problem in the execution of parallel nested loops with non-affine bounds is the precise scanning (i.e., enumeration) of the points in their iteration space and the detection of their termination. Scanning schemes have been proposed for both shared-memory and distributed-memory implementations. However, these schemes work only for perfectly nested while loops. We propose a scheme which...
Architectural and Software Support for Executing Numerical Applications on High Performance Computers
Numerical applications require large amounts of computing power. Although shared memory multiprocessors provide a cost-effective platform for parallel execution of numerical programs, parallel processing has not delivered the expected performance on these machines. There are two crucial steps in parallel execution of numerical applications: (1) effective parallelization of an application and (2) ...
Simple Code Generation for special UDLs
This paper focuses on transforming sequential perfectly nested loops into their equivalent parallel form. A special category of nested FOR loops is the uniform dependence loops (UDLs), which yield efficient parallelization techniques. An automatic code generation tool for shared and distributed memory machines has been developed in order to automatically parallelize these perfectly nested loop...
Publication date: 1992